344 research outputs found
Query-driven document partitioning and collection selection
Abstract — We present a novel strategy to partition a document collection onto several servers and to perform effective collection selection. The method is based on the analysis of query logs. We propose a novel document representation called the query-vectors model: each document is represented as a list recording the queries for which the document is a match, along with their ranks. To both partition the collection and build the collection selection function, we co-cluster queries and documents. The document clusters are then assigned to the underlying IR servers, while the query clusters represent queries that return similar results and are used for collection selection. We show that this document partitioning strategy greatly boosts the performance of standard collection selection algorithms, including CORI, with respect to a round-robin assignment. Second, we show that by performing collection selection through matching the query against the existing query clusters and then choosing only one server, we reach an average precision-at-5 of up to 1.74 and consistently improve on CORI precision by a factor between 11% and 15%. As a side result, we show a way to identify rarely asked-for documents. Separating these documents from the rest of the collection allows the indexer to produce a more compact index containing only relevant documents that are likely to be requested in the future. In our tests, around 52% of the documents (3,128,366) are not returned among the first 100 top-ranked results of any query.
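The query-vectors representation described above can be made concrete with a small sketch. This is an illustrative reconstruction, not the paper's code: the 1/rank weighting and the toy query log are assumptions chosen for clarity; the paper builds these vectors from a real query log and then co-clusters queries and documents.

```python
# Hedged sketch of the query-vectors model: each document is described by
# the queries that retrieve it, weighted here by the reciprocal of its
# rank in their result lists (an illustrative weighting choice).

def query_vectors(query_results):
    """query_results: query -> ranked list of matching document ids."""
    vectors = {}
    for q, ranked_docs in query_results.items():
        for rank, doc in enumerate(ranked_docs, start=1):
            vectors.setdefault(doc, {})[q] = 1.0 / rank
    return vectors

# Toy query log: two queries with their ranked result lists.
log = {
    "hubble tension": ["doc1", "doc2"],
    "dark energy":    ["doc2", "doc3"],
}
vecs = query_vectors(log)
print(vecs["doc2"])   # matched by both queries, at ranks 2 and 1
```

Documents never returned by any query get no vector at all, which is exactly the "rarely asked-for" set the abstract proposes to separate from the main index.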
A look at the Hubble speed from first principles
We introduce a novel way of measuring $H_0$ from a combination of independent geometrical datasets, namely Supernovae, Baryon Acoustic Oscillations and Cosmic Chronometers, without the need of calibration nor the choice of a cosmological model. Our method builds on the \emph{distance duality relation}, which sets the ratio of luminosity and angular diameter distances to a fixed scaling with redshift for any metric theory of gravity with standard photon propagation. In our analysis of the data we employ Gaussian Process algorithms to obtain constraints that are independent of the underlying cosmological model. We find a measurement of $H_0$ in km/s/Mpc, showing that it is possible to constrain $H_0$ with minimal assumptions. While competitive with current astrophysical and cosmological constraints, our result is not precise enough to solve the Hubble tension in a definitive way. However, we uncover some interesting features that hint at a twofold solution of the tension.
Comment: 7 pages, 5 figures. Any comments are most welcome.
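The distance duality relation the abstract builds on can be written explicitly. For any metric theory of gravity with standard photon propagation, the luminosity distance $d_L$ and the angular diameter distance $d_A$ obey Etherington's relation:

```latex
% Etherington distance duality relation
d_L(z) = (1+z)^2 \, d_A(z)
```

Because Supernovae constrain (uncalibrated) $d_L(z)$ while Baryon Acoustic Oscillations constrain $d_A(z)$, this fixed redshift scaling is what lets the datasets be tied together without an external calibration or a specific cosmological model, as the abstract describes.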
Interpretable Predictions of Tree-based Ensembles via Actionable Feature Tweaking
Machine-learned models are often described as "black boxes". In many real-world applications, however, models may have to sacrifice predictive power in favour of human-interpretability. When this is the case, feature engineering
becomes a crucial task, which requires significant and time-consuming human
effort. Whilst some features are inherently static, representing properties
that cannot be influenced (e.g., the age of an individual), others capture
characteristics that could be adjusted (e.g., the daily amount of carbohydrates
taken). Nonetheless, once a model is learned from the data, each prediction it makes on new instances is irreversible, as every instance is assumed to be a static point located in the chosen feature space. There are many circumstances, however,
where it is important to understand (i) why a model outputs a certain
prediction on a given instance, (ii) which adjustable features of that instance
should be modified, and finally (iii) how to alter such a prediction when the
mutated instance is input back to the model. In this paper, we present a
technique that exploits the internals of a tree-based ensemble classifier to
offer recommendations for transforming true negative instances into positively
predicted ones. We demonstrate the validity of our approach using an online
advertising application. First, we design a Random Forest classifier that
effectively separates two types of ads: low (negative) and high
(positive) quality ads (instances). Then, we introduce an algorithm that
provides recommendations that aim to transform a low quality ad (negative
instance) into a high quality one (positive instance). Finally, we evaluate our
approach on a subset of the active inventory of a large ad network, Yahoo
Gemini.
Comment: 10 pages, KDD 201
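The core recommendation step can be sketched as follows. This is an illustrative reconstruction under assumptions, not the paper's implementation: positive paths are given directly as lists of threshold conditions, the epsilon offset and the Euclidean cost are example choices, and the toy ensemble is made up.

```python
# Hedged sketch of actionable feature tweaking for a tree ensemble.
# A "positive path" is a list of (feature_index, threshold, direction)
# conditions leading to a positive leaf; direction is "<=" or ">".

def tweak_for_path(x, path, eps=0.1):
    """Minimally perturb x so it satisfies every condition on the path."""
    x_new = list(x)
    for feat, thr, direction in path:
        if direction == "<=" and not x_new[feat] <= thr:
            x_new[feat] = thr - eps
        elif direction == ">" and not x_new[feat] > thr:
            x_new[feat] = thr + eps
    return x_new

def cost(x, x_new):
    """Euclidean distance used as the tweaking cost (one possible choice)."""
    return sum((a - b) ** 2 for a, b in zip(x, x_new)) ** 0.5

def best_tweak(x, positive_paths, eps=0.1):
    """Return the cheapest epsilon-satisfactory transformation of x."""
    candidates = [tweak_for_path(x, p, eps) for p in positive_paths]
    return min(candidates, key=lambda c: cost(x, c))

# Toy ensemble with two positive paths over features 0 and 1.
paths = [
    [(0, 0.5, ">"), (1, 0.3, "<=")],
    [(0, 0.8, ">")],
]
x = [0.2, 0.9]          # a "negative" instance
print(best_tweak(x, paths))
```

Each candidate is the instance moved just past the thresholds of one positive path; picking the minimum-cost candidate yields the recommendation with the smallest suggested change.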
Efficient Diversification of Web Search Results
In this paper we analyze the efficiency of various search results diversification methods. While the efficacy of diversification approaches has been investigated in depth in the past, response time and scalability issues have rarely been addressed. We thus propose a unified framework for studying the performance and feasibility of result diversification solutions. First, we define a new methodology for detecting when, and how, query results need to be diversified. To this purpose, we rely on the concept of "query refinement" to estimate the probability that a query is ambiguous. Then, relying on this novel ambiguity detection method, we deploy and compare, on a standard test set, three different diversification methods: IASelect, xQuAD, and OptSelect. While the first two are recent state-of-the-art proposals, the latter is an original algorithm introduced in this paper. We evaluate both the efficiency and the effectiveness of our approach against its competitors by using the standard TREC Web diversification track testbed. Results show that OptSelect runs two orders of magnitude faster than the other two state-of-the-art approaches while obtaining comparable diversification effectiveness.
Comment: VLDB201
Tour recommendation for groups
Consider a group of people who are visiting a major touristic city, such as New York, Paris, or Rome. It is reasonable to assume that each member of the group has his or her own interests or preferences about places to visit, which in general may differ from those of other members. Still, people almost always want to hang out together, and so the following question naturally arises: What is the best tour that the group could take together in the city? This problem raises several challenges, ranging from understanding people's expected attitudes towards potential points of interest, to modeling and providing good and viable solutions. Formulating this problem is challenging because of its multiple competing objectives: for example, making the entire group as happy as possible generally conflicts with the objective that no member becomes disappointed. In this paper, we address the algorithmic implications of the above problem by providing various formulations that take into account the overall group satisfaction, the individual satisfaction, and the length of the tour. We then study the computational complexity of these formulations, provide effective and efficient practical algorithms, and, finally, evaluate them on datasets constructed from real city data.
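One way to make the competing objectives concrete is a small brute-force sketch. This is an illustrative formulation under assumptions, not the paper's models: the additive preference scores, the alpha blend of total versus minimum satisfaction, and the length budget are all toy choices.

```python
# Hedged sketch of one possible group-tour objective: choose a subset of
# points of interest (POIs) under a tour-length budget, trading off
# total group satisfaction (sum) against fairness (the least-satisfied
# member), via a blending parameter alpha.
from itertools import combinations

def tour_score(tour, prefs, alpha=0.5):
    """Blend overall happiness (sum) with fairness (min over members)."""
    per_member = [sum(p[poi] for poi in tour) for p in prefs]
    return alpha * sum(per_member) + (1 - alpha) * min(per_member)

def best_tour(pois, lengths, prefs, budget, alpha=0.5):
    """Brute-force search over POI subsets that fit the length budget."""
    best, best_val = (), float("-inf")
    for r in range(1, len(pois) + 1):
        for tour in combinations(pois, r):
            if sum(lengths[poi] for poi in tour) > budget:
                continue
            val = tour_score(tour, prefs, alpha)
            if val > best_val:
                best, best_val = tour, val
    return best

pois = ["museum", "park", "tower"]
lengths = {"museum": 2, "park": 1, "tower": 3}
prefs = [{"museum": 5, "park": 1, "tower": 4},   # member 1
         {"museum": 1, "park": 5, "tower": 4}]   # member 2
print(best_tour(pois, lengths, prefs, budget=4))
```

Raising alpha favours total happiness; lowering it protects the least-satisfied member, which is exactly the tension between objectives the abstract describes. Brute force is only viable for tiny instances, motivating the complexity analysis and practical algorithms the paper develops.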
Community Membership Hiding as Counterfactual Graph Search via Deep Reinforcement Learning
Community detection techniques are useful tools for social media platforms to
discover tightly connected groups of users who share common interests. However,
this functionality often comes at the expense of potentially exposing
individuals to privacy breaches by inadvertently revealing their tastes or
preferences. Therefore, some users may wish to safeguard their anonymity and
opt out of community detection for various reasons, such as affiliation with
political or religious organizations.
In this study, we address the challenge of community membership hiding, which
involves strategically altering the structural properties of a network graph to
prevent one or more nodes from being identified by a given community detection
algorithm. We tackle this problem by formulating it as a constrained
counterfactual graph objective, and we solve it via deep reinforcement
learning. We validate the effectiveness of our method through two distinct
tasks: node and community deception. Extensive experiments show that our
approach overall outperforms existing baselines in both tasks.
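The counterfactual search loop at the heart of membership hiding can be sketched as follows. This is a hedged illustration: the connected-components "detector" and the greedy edge-removal policy below are stand-ins chosen so the sketch is self-contained, whereas the paper plugs in a real community detection algorithm and learns which edits to make with deep reinforcement learning.

```python
# Hedged sketch of community membership hiding: alter the target node's
# edges, within a budget, until a (pluggable) community detection
# routine no longer places it with its original community.

def communities_by_component(adj):
    """Toy detector: communities = connected components of the graph."""
    seen, comms = set(), []
    for start in adj:
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            u = stack.pop()
            if u in comp:
                continue
            comp.add(u)
            stack.extend(adj[u] - comp)
        seen |= comp
        comms.append(comp)
    return comms

def hide_node(adj, target, budget, detect=communities_by_component):
    """Remove up to `budget` of target's edges until its community changes."""
    original = next(c for c in detect(adj) if target in c)
    edits = 0
    for neigh in sorted(adj[target]):
        if edits >= budget:
            break
        adj[target].discard(neigh)   # counterfactual edit: drop one edge
        adj[neigh].discard(target)
        edits += 1
        current = next(c for c in detect(adj) if target in c)
        if current != original:
            return True
    return False

# Triangle {a, b, c}: hide "c" from its community within two edits.
adj = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}}
print(hide_node(adj, "c", budget=2))
```

The budget constraint mirrors the "constrained" part of the paper's counterfactual objective: the rewiring must stay small, so the rest of the graph (and the communities of other users) is disturbed as little as possible.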
Misspelling Oblivious Word Embeddings
In this paper we present a method to learn word embeddings that are resilient to misspellings. Existing word embeddings have limited applicability to malformed texts, which contain a non-negligible amount of out-of-vocabulary words. We propose a method that combines FastText's subword modeling with a supervised task of learning misspelling patterns. In our method, misspellings of each word are embedded close to their correct variants. We train these embeddings on a new dataset that we are releasing publicly. Finally, we experimentally show the advantages of this approach on both intrinsic and extrinsic NLP tasks using public test sets.
Comment: 9 pages
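Why subword modeling helps with misspellings can be shown with a short sketch. The n-gram extraction mirrors FastText's "<word>" boundary convention; the Jaccard overlap measure and the example words are illustrative assumptions, not the paper's evaluation.

```python
# Hedged sketch of why character n-gram (subword) representations help
# with misspellings: a word and its misspelling share most n-grams, so
# their embeddings (sums of n-gram vectors in FastText-style models)
# start out close even before any supervised misspelling training.

def char_ngrams(word, n_min=3, n_max=5):
    """Extract character n-grams of a word with FastText-style boundaries."""
    token = f"<{word}>"
    return {token[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(token) - n + 1)}

def overlap(w1, w2):
    """Jaccard overlap between the subword sets of two spellings."""
    a, b = char_ngrams(w1), char_ngrams(w2)
    return len(a & b) / len(a | b)

# A misspelling shares far more subwords with its correct form than
# an unrelated word does.
print(overlap("beautiful", "beutiful"))
print(overlap("beautiful", "keyboard"))
```

The supervised task described in the abstract then goes further, explicitly pulling each misspelling's embedding toward its correct variant rather than relying on subword overlap alone.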
Sheaf Neural Networks for Graph-based Recommender Systems
Recent progress in Graph Neural Networks has resulted in wide adoption by many applications, including recommendation systems. The reason for Graph Neural Networks' superiority over other approaches is that many problems in recommendation systems can be naturally modeled as graphs, where nodes can be either users or items and edges represent preference relationships. In current Graph Neural Network approaches, nodes are represented with a static vector learned at training time. This static vector might only be suitable to capture some of the nuances of the users or items it describes. To overcome this limitation, we propose using a recently proposed model inspired by category theory: Sheaf Neural Networks. Sheaf Neural Networks, and their associated sheaf Laplacian, can address the previous problem by associating every node (and edge) with a vector space instead of a single vector. The vector space representation is richer and allows picking the proper representation at inference time. This approach can be generalized to different related tasks on graphs and achieves state-of-the-art performance in terms of F1-Score@N in collaborative filtering and Hits@20 in link prediction. For collaborative filtering, the approach is evaluated on MovieLens 100K with a 5.1% improvement, on MovieLens 1M with a 5.4% improvement, and on Book-Crossing with a 2.8% improvement; for link prediction, it is evaluated on the ogbl-ddi dataset with a 1.6% improvement over the respective baselines.
Comment: 9 pages, 7 figures
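What "a vector space per node" and a sheaf Laplacian mean can be made concrete on a toy graph. This is a hedged numerical sketch, not the paper's model: the stalk dimension and the restriction maps below are fixed by hand for illustration, whereas Sheaf Neural Networks learn them from data.

```python
# Hedged sketch of the sheaf Laplacian on a toy 2-node, 1-edge graph.
# Each node and edge carries a stalk R^d; restriction maps move node
# data into edge stalks, and the Laplacian L = delta^T delta measures
# disagreement across edges after restriction.
import numpy as np

d = 2                                    # stalk dimension
# Edge (u, v) with restriction maps F_u: stalk(u) -> stalk(e), F_v.
F_u = np.array([[1.0, 0.0], [0.0, 1.0]])
F_v = np.array([[0.0, 1.0], [1.0, 0.0]])

# Coboundary over the single edge: (delta x)_e = F_u x_u - F_v x_v.
delta = np.hstack([F_u, -F_v])           # shape (d, 2*d)
L = delta.T @ delta                      # sheaf Laplacian, shape (2*d, 2*d)

# x is "consistent" across the edge iff F_u x_u == F_v x_v, i.e. L x = 0.
x = np.array([1.0, 2.0, 2.0, 1.0])       # x_u = (1, 2), x_v = (2, 1)
print(L @ x)                             # zero vector: a global section
```

With identity restriction maps this reduces to the ordinary graph Laplacian acting block-wise; it is the learnable, non-identity maps that give each node's vector space its own "point of view" on shared edges.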